Datasets used in adversarial research on CodeLMs

| Dataset | Year | Programming Language | Data Source | Download Link |
|---|---|---|---|---|
| BigCloneBench | 2014 | Java | GitHub | Download |
| OJ dataset | 2016 | C++ | OJ Platform | Download |
| CodeSearchNet | 2019 | Go, Java, JavaScript, PHP, Python, Ruby | GitHub | Download |
| Code2Seq | 2019 | Java | GitHub | Download |
| Devign | 2019 | Java | GitHub | Download |
| Google Code Jam (GCJ) | 2020 | C++, Java | OJ Platform | Download |
| CodeXGLUE | 2021 | Go, Java, JavaScript, PHP, Python, Ruby | GitHub | Download |
| CodeQA | 2021 | Java, Python | GitHub | Download |
| APPS | 2021 | Python | OJ Platform | Download |
| Shellcode_IA32 | 2021 | Assembly language instructions | OJ Platform | Download |
| SecurityEval | 2022 | Python | GitHub | Download |
| LLMSecEval | 2023 | Python, C | GitHub | Download |
| PoisonPy | 2023 | Python | GitHub | Not yet published |

A summary of target models of adversarial attacks in CodeLMs

| Attack Technique | Year | Venue | Attack Type | Target Models | Target Tasks |
|---|---|---|---|---|---|
| Quiring et al. | 2019 | USENIX Security | Black-box Attack | Random Forest, LSTM | Authorship Attribution |
| DAMP | 2020 | OOPSLA | White-box Attack | Code2Vec, GGNN | Method Name Prediction, Variable Name Prediction |
| STRATA | 2020 | arXiv | Black-box Attack | Code2Seq | Method Name Prediction |
| MHM | 2020 | AAAI | Black-box Attack | BiLSTM, ASTNN | Function Classification |
| Srikant et al. | 2021 | ICLR | White-box Attack | Seq2Seq | Method Name Prediction |